A Unified Method for Extracting Simple and Multiword Verbs with Valence Information and Application for Hungarian

Author

  • Bálint Sass
Abstract

We present a method for extracting verb-centered constructions (VCCs) from corpora. In our framework, simple and multiword verbs, with or without valence, are all VCCs. They are treated uniformly, from, e.g., to breathe to, e.g., to take something into consideration. To extract VCCs, we represent the corpus as a sequence of clauses, each containing a verb together with all its NP dependents. The method generalizes an earlier subcategorization frame extraction method. It is based on cumulative counting of frequent subframes: small frequency counts are passed on to one of the longest available subframes, chosen at random. The method automatically determines the number of elements in a VCC, and it automatically detects whether a content word is an integral part of the VCC (forming a multiword verb) or whether only the verb-dependent relation matters (forming a valence slot of the verb). The significance of our method lies in its ability to handle multiword verbs and their valence simultaneously. The paper includes an evaluation for Hungarian, in which we obtain precision values above 80% using n-best list evaluation. The representation and the method are in essence language independent and could be applied to other languages as well.
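
The cumulative counting step described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function name `extract_frames`, the `min_freq` threshold, the slot strings such as `"obj"` and `"into=consideration"`, and the rule of preferring already observed subframes are all assumptions made for illustration. The sketch also only drops whole slots; it does not model the step of generalizing a slot's content word away while keeping the verb-dependent relation.

```python
import random
from collections import Counter


def extract_frames(clauses, min_freq=2, seed=0):
    """Toy cumulative counting over verb frames (illustrative assumption only).

    clauses: iterable of (verb, slots) pairs, where slots is a set of strings.
    Frames rarer than min_freq pass their count to a randomly chosen
    subframe that has one slot fewer.
    """
    rng = random.Random(seed)
    counts = Counter((verb, frozenset(slots)) for verb, slots in clauses)
    kept = {}
    max_len = max((len(slots) for _, slots in counts), default=0)

    # Process the longest frames first so that inherited counts accumulate
    # on shorter frames before those are examined.
    for length in range(max_len, -1, -1):
        level = sorted(
            (f for f in counts if len(f[1]) == length),
            key=lambda f: (f[0], sorted(f[1])),
        )
        for verb, slots in level:
            n = counts[(verb, slots)]
            # Assumption: a bare verb is always kept as the final fallback.
            if n >= min_freq or length == 0:
                kept[(verb, slots)] = n
                continue
            # Candidate subframes: drop exactly one slot.
            shorter = sorted(
                ((verb, slots - {s}) for s in slots),
                key=lambda f: sorted(f[1]),
            )
            # Assumption: prefer subframes already seen in the corpus.
            observed = [f for f in shorter if f in counts]
            target = rng.choice(observed or shorter)
            counts[target] += n
    return kept


clauses = (
    [("take", {"obj", "into=consideration"})] * 3
    + [("take", {"obj", "into=consideration", "on=Monday"})]
    + [("breathe", set())] * 2
)
result = extract_frames(clauses)
```

In this toy input, the single three-slot clause is too rare to stand on its own, so its count is inherited by the two-slot frame for "take", while "breathe" survives as a simple verb with no slots.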

Similar resources

Multiword Verbs in WordNets

In this paper, we describe how wordnets treat multiword verbs. We pay special attention to the English and Hungarian wordnets and we argue that from a multilingual perspective it is recommended to store idioms and light verb constructions as a whole rather than listing their parts separately. In order to enhance their applicability in multilingual applications, a unified treatment should be app...

Assignment problem and its application in Nigerian institutions: Hungarian method approach

The assignment model is a powerful operations research technique that can be used to solve assignment or allocation problems. This study applies the assignment model to the course allocation problem in Nigerian tertiary institutions in order to maximize lecturers’ effectiveness. A well-structured questionnaire was used to obtain data from lecturers, and the problem was solved with the Hungarian method. The study revealed...

Romanian Valence Dictionary in XML Format

Valence dictionaries are dictionaries in which logical predicates (most of the time verbs) are inventoried alongside the semantic and syntactic information regarding the role of the arguments with which they combine, as well as the syntactic restrictions these arguments have to obey. In this article we present the incipient stage of the project “Syntactic and semantic database in XML form...

Novel Unified Control Method of Induction and Permanent Magnet Synchronous Motors

Many control schemes have been proposed for induction motor and permanent magnet synchronous motor control, almost all of which are highly complex and non-linear. Also, a simple and efficient method for unified control of electric motors has rarely been investigated. In this paper, a novel control method based on rotor flux orientation is proposed. The novelties of the proposed method are elimination of q-ax...

Publication date: 2009